Write buffer for use in a data processing apparatus

ABSTRACT

The present invention provides a data processing apparatus comprising a processor core for generating addresses identifying locations in a memory and data values for storing in the memory, and a write buffer for storing the addresses and data values output by the processor core, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory. The write buffer comprises a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.  
     In accordance with the present invention, the write buffer provided by the data processing apparatus adaptively adjusts the number of rows it requires for addresses, and hence can be arranged to occupy a relatively small area, whilst still efficiently supporting both burst mode and non-burst mode write traffic.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a data processing apparatus for buffering addresses identifying locations in a memory, and data values to be written to those memory locations. The term ‘data value’ is used herein to refer to both instructions and to items or blocks of data, such as data words.

[0003] 2. Description of the Prior Art

[0004] A typical data processing apparatus includes a processor core (or CPU) arranged to execute a sequence of instructions that are applied to data supplied to the processor core. Generally, a memory may be provided for storing the instructions and data (collectively referred to herein as “data values”) required by the processor core. Further, it is often the case that one or more caches are provided for storing data values required by the processor core, so as to reduce the number of accesses required to the memory.

[0005] Whilst the use of a cache improves the processing speed of the processor core, there is still the requirement for the processor core to read data values from, and write data values to, the memory, and these processes are relatively slow, thereby adversely affecting the processing speed of the processor core.

[0006] To alleviate the impact on processing speed resulting from writing data values to a memory, it is known to provide a write buffer that is typically arranged to decouple a cached CPU from the memory, so as to allow the processor bus to complete a write operation to the intermediate write buffer, and for that write buffer to then autonomously perform the write to the memory bus. By this approach, the CPU does not need to wait for the write process to complete before proceeding to execute the next instruction. Further, the write buffer depth can be increased beyond a single register to enable a plurality of CPU data writes to be buffered, for example by using a First-In-First-Out (FIFO) buffer to maintain write transaction ordering.

[0007] In general terms, a write buffer presents a “slave” interface to a “master” at its input side, and presents an “initiator bus” interface to the memory bus on its output side. The slave interface generally requires address (a), control (c) and write data (d) signals. The control signal will typically include control information such as operand size, protection and access flags. The master interface, for example the interface between the CPU and the processor bus, similarly must source the same address, control and write data information, and may additionally perform funnelling to narrower or wider data bus width.

[0008] In a simple prior art write buffer, the slave interface of the write buffer will have a width of “a+c+d” bits (for address, control and data bus widths). In such an arrangement, the write buffer storage requirements are:

[0009] a+c+d bits wide×number of write buffer slots.

[0010] Generally, when developing data processing apparatus, such as integrated circuits, there is a desire to keep the circuit as small as possible. The space that an integrated circuit occupies is at a premium. The smaller an integrated circuit is, the less expensive it will be to manufacture and the higher the manufacturing yield. For this reason, it is clear that the number of write buffer slots provided within the write buffer cannot be increased at will, as the overall size of the integrated circuit must be kept as small as possible.

[0011] Whenever the write buffer fills to capacity, the processor stalls on a subsequent write operation until a free slot in the write buffer becomes available. The maximum write buffer depth is application dependent, and is a trade off between chip area, sustainable burst write bandwidth, and the “latency” of the memory, or secondary, bus where a read transaction is blocked until the write buffer has been emptied.

[0012] For cached processors and higher bandwidth systems, much of the write traffic is in the form of “bursts” (i.e. cache line replacements or stack context saves), where a base address and a fixed or variable number of data words are transferred. However, there will still typically be some non-burst (e.g. 8-bit and 16-bit) accesses (e.g. character or “short” data).

[0013] In such arrangements, the area required by the write buffer may be reduced by separating the address/control paths from the data path so as to provide two logically separate write buffers, one for the address and control signals, and one for the data signals. Since there will generally be fewer addresses than data values in burst mode operation, the number of address slots provided in the write buffer can be significantly less than the number of data slots. However, this saving in area from providing fewer address slots is typically traded for more data slots, such that the overall area of the write buffer is optimized for typical usage.

[0014] Hence, for such burst mode write buffers, the write buffer storage is:

[0015] a+c bits wide×number of address slots

[0016] d bits wide×number of data slots
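
The two storage formulas above can be compared numerically. The following is a minimal sketch only: the function names are hypothetical, the 32/4/32-bit widths are borrowed from the embodiment described later in this document, and the slot counts are illustrative assumptions rather than figures from the prior art.

```python
def simple_buffer_bits(a, c, d, slots):
    """Storage of a simple write buffer: every slot holds address, control and data."""
    return (a + c + d) * slots


def split_buffer_bits(a, c, d, address_slots, data_slots):
    """Storage of a burst mode write buffer with separate address and data slots."""
    return (a + c) * address_slots + d * data_slots


# Assumed widths: 32-bit address, 4-bit control, 32-bit data.
simple_total = simple_buffer_bits(32, 4, 32, slots=8)                      # 68 * 8
split_total = split_buffer_bits(32, 4, 32, address_slots=2, data_slots=14)  # 72 + 448
```

With these assumed figures the split arrangement holds fourteen data words in fewer bits than the simple arrangement needs for eight, but it can only track two outstanding addresses.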

[0017] In such an arrangement, an address incrementer is typically required to re-synthesize the burst addresses as the contents of the write buffer are output to memory, and more complex control logic is required to interlock the address and data write buffer reconstruction.

[0018] Whilst such an arrangement is clearly advantageous for burst mode write traffic, if there are any non-burst stores (i.e. byte structure access), then the number of address slots becomes a limiting factor, since in this non-burst mode, there will be one address for each data word.

[0019] Given that many data processing apparatus typically employ both burst mode and non-burst mode stores to memory, it would be desirable to provide the data processing apparatus with a write buffer that operates efficiently for both burst mode write traffic and non-burst mode write traffic, without having to increase the size of the write buffer with respect to the size of known prior art write buffers.

SUMMARY OF THE INVENTION

[0020] Viewed from a first aspect, the present invention provides a data processing apparatus comprising: a processor core for generating addresses identifying locations in a memory and data values for storing in the memory; a write buffer for storing the addresses and data values output by the processor core, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory; the write buffer comprising a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.

[0021] In accordance with the present invention, each row of the write buffer is able to store either an address or a data value, an additional flag field is associated with each row, and the flag field is settable to indicate whether that row contains an address or a data value. Hence, in burst-mode, a particular row will be used to store the base address, with the flag field for that row being set accordingly to indicate that an address is contained within that row, and then subsequently the data values forming the burst traffic will be stored in other rows of the write buffer, with the flag fields of those rows being set to indicate that data values are contained within those rows. This approach makes very efficient use of the available write buffer area when buffering burst mode write traffic.

[0022] However, it is clear that the arrangement of the present invention also supports non-burst write traffic, where the rows of the write buffer will alternately store addresses and data values, with the flag fields for each row being set accordingly.

[0023] It has been found that a write buffer in accordance with the present invention can be arranged to occupy a relatively small area, whilst providing a good compromise between a write buffer optimized for non-burst mode traffic, and a write buffer optimized for burst mode traffic.

[0024] In preferred embodiments, each row comprises ‘n’ bits and the flag field comprises one or more of said ‘n’ bits. Preferably, said flag field comprises a single bit, since this keeps the space required for the flag field to a minimum whilst ensuring that sufficient information is still provided to determine whether any particular row contains an address or a data value.

[0025] In preferred embodiments, the data processing apparatus further comprises a multiplexer for receiving said addresses and data values from the processor core; and input control logic for controlling the multiplexer to output either a data value or an address to the write buffer for storage in a particular row; the input control logic further controlling the write buffer to set the flag field for that particular row to indicate whether that row has an address or a data value stored therein.

[0026] Further, in preferred embodiments, each row further comprises a control field, wherein if an address is stored in a particular row, then the control field of that row is used to store control data associated with the address. Hence, in this arrangement, the input control logic will cause the multiplexer to output the address for storing within the particular row, and also the control data for storing within the control field of that row, with the flag field being set to indicate that that particular row contains an address.

[0027] Preferably, if a data value is stored in a particular row, then the control field is used to store mask data identifying the region or regions of that row containing data. Hence, the control field is still used, even if the row is being used to store a data value rather than an address. In preferred embodiments, a plurality of bytes in the row are reserved for storing the data value, and the mask data indicates which of said plurality of bytes contain the data value. Hence, if the write buffer is connected to a 32-bit data bus, such that a data word can be up to four bytes long, then four bytes will be reserved for storing the data value in each row. However, if the data value to be stored in a particular row is less than four bytes in length, then not all of the four bytes in the row will be used to store the data value. In this instance, the mask data is used to indicate which of the plurality of bytes in the row do contain the data value. In preferred embodiments, the input control logic is arranged to control the write buffer to generate the mask data.

[0028] Further, in preferred embodiments, the data processing apparatus comprises output control logic for controlling the output to the memory of the addresses and data values stored in the write buffer. Preferably, the data processing apparatus comprises a demultiplexer for receiving the contents of a row of the write buffer, the output control logic being arranged to determine from the flag field whether an address or a data value is included in the row, and to instruct the demultiplexer to output a data value onto a data line or an address onto an address line. The input and output control logic may be provided by separate logic components, but in preferred embodiments are provided by the same logic component.

[0029] In preferred embodiments, any burst mode stores in the write buffer are resynthesized before passing on to the memory bus. Hence, in preferred embodiments, the data processing apparatus further comprises an incrementer for receiving addresses output on the address line. Thus, if after receiving the address at the incrementer, a plurality of rows of data values are read out from the write buffer, then each time a data value is placed on the memory bus, the address can be incremented by the incrementer, and the corresponding incremented address output on to the address bus of the memory bus. In this way, the memory will receive the necessary address information to enable it to store each data value received.

[0030] In preferred embodiments, the demultiplexer is arranged to output onto a control line control data within the row received from the write buffer, and the data processing apparatus further comprises a register for storing the control data. In preferred embodiments, the control data will be output each time a row of the write buffer containing a data value is output on to the memory bus. By storing the control data in a register, this information can be output on to the control bus of the memory bus as required.

[0031] In preferred embodiments, the write buffer is a First-In-First-Out (FIFO) buffer, since this ensures that write transaction ordering is maintained.

[0032] Viewed from a second aspect, the present invention provides a write buffer for storing addresses identifying locations in a memory and data values for storing in the memory, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory, the write buffer comprising: a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.

BRIEF DESCRIPTION OF THE DRAWINGS

[0033] An embodiment of the invention will be described hereinafter, by way of example only, with reference to the accompanying drawings in which like reference signs are used for like features, and in which:

[0034] FIG. 1 is a block diagram illustrating a data processing apparatus in accordance with the preferred embodiment of the present invention;

[0035] FIG. 2 is a block diagram illustrating the logic provided to store address, data and control signals in the write buffer of preferred embodiments of the present invention, and subsequently to read and resynchronise the address, data and control signals for outputting to a memory; and

[0036] FIG. 3 illustrates the structure of the adaptive write buffer of preferred embodiments of the present invention.

DESCRIPTION OF A PREFERRED EMBODIMENT

[0037] A data processing circuit in accordance with the preferred embodiment of the present invention will be described with reference to the block diagram of FIG. 1. As shown in FIG. 1, the data processing circuit has a processor core 10 arranged to process instructions received from memory 120. Data required by the processor core 10 for performing those instructions may also be retrieved from memory 120. A cache 30 is provided for storing data and instructions retrieved from the memory 120 so that they are subsequently readily accessible by the processor core 10. A cache control unit 40 is also provided to control the storage of instructions and data in the cache 30, and to control the retrieval of the data and instructions from the cache.

[0038] When the processor core 10 requires an instruction or an item of data (hereafter instructions or data will both be referred to as data values), it places the memory address of that data value on bus line 54 of processor bus 50. Further, the processor core 10 issues a processor control signal on bus line 52. The processor control signal includes information such as whether the address corresponds to a read or a write request, the type of access (e.g. sequential), the size of the access (e.g. word, byte), the operating mode of the processor (e.g. supervisor or user), etc. This processor control signal is received by the cache control unit 40 and prompts the cache control unit to determine whether the required data value is stored within the cache 30. The cache control unit 40 instructs the cache 30 to compare the address on bus line 54 with the addresses in the cache to determine whether the data value corresponding to that address is stored within the cache. If so, the data value is output from the cache 30 onto the data bus line 56 where it is then read by the processor core 10. If the data value corresponding to the address is not within the cache 30, then the cache control unit 40 passes a signal over line 130 to the bus interface unit (BIU) 95 to indicate that the data value needs to be retrieved from memory 120.

[0039] Whilst this cache look up process is taking place, the memory management unit (MMU) 20 also receives the processor control signal on bus line 52, and upon determining that the processor control signal relates to a potential read or write access to memory 120 or cache 30, is arranged to examine the address placed by the processor core 10 on bus line 54.

[0040] Different areas of the memory 120 may be used to store data values having different attributes, such as protection, cacheable and bufferable attributes. Hence, the MMU 20 is arranged to determine from the address the attributes used to control access to the memory 120 or use of the data values retrieved from the cache 30. These attributes are then passed to the BIU 95.

[0041] As mentioned earlier, the MMU 20 receives the processor control signal from bus line 52, this processor control signal defining, amongst other things, the mode of operation of the processor core 10. Hence this information can be used by the MMU 20 to determine whether the attributes determined from the address allow the processor core 10 in its current mode of operation to have access to the memory address requested. For example, if the processor control signal indicates that the processor core 10 is in a user mode, and the attributes determined from the address indicate that the memory address can only be accessed in supervisor mode, then the MMU 20 can be arranged to produce an abort signal on path 140 to the processor core 10 and on path 170 to the Bus Interface Unit 95.
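
The mode check described above can be modelled in a few lines. This is a hedged sketch only: the region table, the `Attributes` type and the function names are hypothetical, and treating an unmapped address as an abort is an assumption that the description does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    supervisor_only: bool  # protection attribute
    cacheable: bool
    bufferable: bool


# Hypothetical region table: (start, end_exclusive, attributes).
REGIONS = [
    (0x0000_0000, 0x0001_0000, Attributes(True, True, True)),
    (0x0001_0000, 0x1000_0000, Attributes(False, True, True)),
]


def lookup_attributes(address):
    """Return the attributes of the region containing address, or None."""
    for start, end, attrs in REGIONS:
        if start <= address < end:
            return attrs
    return None


def mmu_abort(address, supervisor_mode):
    """True if an abort signal should be raised for this access."""
    attrs = lookup_attributes(address)
    if attrs is None:
        return True  # assumption: unmapped addresses abort
    return attrs.supervisor_only and not supervisor_mode
```

A user mode access to an address in the first (supervisor only) region yields an abort, while the same access in supervisor mode does not.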

[0042] The processing performed by the MMU 20 preferably happens at the same time as the cache look up process so as to maintain sufficient processing speed. If the data value requested is available in the cache 30, and the MMU 20 does not produce an abort signal on lines 140, 170 then the processor core 10 will use the data retrieved from cache 30. However, if the data value requested is not available in cache, then, as discussed earlier, a signal will be sent over path 130 instructing the Bus Interface Unit (BIU) 95 to access the memory 120 for the data value.

[0043] The BIU 95 will examine the processor control signal on bus line 52 to determine whether the instruction issued by the processor core 10 is a read or a write instruction. Assuming it is a read instruction, and that no abort signal is received over path 170 from the MMU 20, then the BIU 95 will instruct the multiplexer 100 to pass the address from bus line 54 on to the external address bus line 64 of bus 60. This assumes that no write instructions to memory 120 are pending in the write buffer 105; if there are any such pending write instructions, these will be completed prior to the read instruction (the action of the write buffer is discussed in more detail later). A control signal will also be placed on bus line 62 which is used by memory controller 180 to control access to the memory 120. The memory controller 180 will determine from the control signal on bus line 62 that a memory read is required, and will instruct the memory to output on the data bus line 66 the data at the address indicated on address bus line 64.

[0044] The BIU 95 will send a signal to buffer 110 to cause the buffer 110 to pass the data placed by the memory 120 on external bus line 66 to the processor bus line 56. Additionally, if the attributes received by the BIU 95 from the MMU 20 indicate that the address contains a cacheable data value, then the BIU 95 will send a signal over path 135 to the cache control 40 to instruct the cache control to store the retrieved data value in cache 30. The data value retrieved from the memory 120 and placed on bus line 56 will then be stored in the cache 30 and also passed to the processor core 10. Subsequently, that data value can readily be accessed by the processor core 10 directly from the cache. If the attributes received by the BIU 95 indicate that the data value is not cacheable, then the data will not be stored in cache, and the processor core 10 will read the data value from bus line 56.

[0045] The above description has illustrated how the MMU 20 is used to control access to the memory 120 for the purposes of reading data values from the memory 120. In the event that the address issued by the processor core 10 is an address to which the processor wishes to write a data value, then the following procedure takes place.

[0046] The processor core will place a processor control signal on bus line 52, an address on bus line 54, and the data value to be stored on bus line 56. The MMU 20 will examine the processor control signal on bus line 52, and upon determining that the processor control signal relates to a write access to memory 120, will examine the address placed by the processor core 10 on bus line 54. The attributes associated with that address will then be output to the BIU 95.

[0047] The BIU 95 will examine the processor control signal on bus line 52 to determine whether the instruction issued by the processor core 10 is a read or a write instruction. Assuming it is a write instruction, the BIU will determine that a write procedure needs to be employed, and will use the attribute information received from the MMU 20 to control that write procedure.

[0048] The MMU 20 will also have determined from the attributes and from the processor control signal whether the processor core is able to write to the particular address in its current mode of operation, and if not, will have issued an abort signal. Any abort signal will be sent to the BIU 95 over path 170 to instruct it to disregard the write instruction, and will also be sent to the processor core 10 over path 140 to cause the data, address and control information to be removed from bus lines 56, 54 and 52, respectively, and to enable the processor core 10 to execute any exception procedure required in the event of such an abort.

[0049] However, assuming the processor core is entitled to write to the address placed on bus line 54, and hence no abort signal is received by the BIU 95, then the BIU 95 will use the attribute information from the MMU 20 to determine whether the data to be written is bufferable or not. If the data is bufferable, then the BIU 95 will instruct the write buffer 105 to retrieve the data, address and control signals from bus 50. Once this has been done, the next instruction can be processed by the processor core 10 without waiting for the write instruction to have been completed.

[0050] The write buffer is preferably a FIFO buffer. When the external bus 60 is free, the BIU 95 instructs the multiplexer 100 to output the next item from the write buffer onto the external bus 60. The multiplexer 100 will then output the necessary control, address and data signals on bus lines 62, 64 and 66 respectively, the memory controller 180 using the control signal to control the write access to memory 120. At this point, the data will be stored in the memory 120. As the data to be stored is sequentially processed from the write buffer 105, then at some point the data corresponding to the address issued by the processor on bus line 54 will be stored in the memory 120.

[0051] If, however, the Bus Interface Unit 95 determines that the address to which the data is to be stored is not bufferable, then the Bus Interface Unit 95 will instruct the multiplexer 100 to select the processor control, address and data information from bus lines 52, 54 and 56 directly. The multiplexer 100 will then output this information onto the external bus 60 so as to cause the data to be stored at the corresponding address in memory 120. However, prior to doing this, the write buffer 105 would typically be drained of any entries within it, so as to ensure that the write instructions are processed in the correct order. Once the non bufferable data corresponding to the current write instruction has been stored, the next instruction can then be processed.
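
The write decision made by the BIU in the preceding paragraphs (abort, buffer, or drain then write directly) can be summarised as a small behavioural model. This is a sketch under stated assumptions: `handle_write` and its arguments are hypothetical names, each write is treated as an opaque item, and the external bus is modelled as a list recording the order in which writes reach memory.

```python
from collections import deque


def handle_write(write, buffer, memory_log, aborted, bufferable):
    """Model the BIU's handling of one write request.

    buffer     -- deque standing in for the FIFO write buffer
    memory_log -- list recording writes in the order they reach memory
    """
    if aborted:
        return "aborted"            # the write instruction is disregarded
    if bufferable:
        buffer.append(write)        # processor may continue immediately
        return "buffered"
    while buffer:                   # drain older writes first to keep ordering
        memory_log.append(buffer.popleft())
    memory_log.append(write)        # then perform the non-bufferable write
    return "direct"


buffer, memory_log = deque(), []
results = [
    handle_write("w1", buffer, memory_log, aborted=False, bufferable=True),
    handle_write("w2", buffer, memory_log, aborted=False, bufferable=True),
    handle_write("w3", buffer, memory_log, aborted=True, bufferable=True),
    handle_write("w4", buffer, memory_log, aborted=False, bufferable=False),
]
```

Draining before the direct write is what keeps `w1` and `w2` ahead of `w4` in memory order, matching the ordering requirement stated above.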

[0052] The above description of FIG. 1 has provided a general overview of the operation of a typical data processing apparatus. The operation of the write buffer 105 of preferred embodiments of the present invention will now be described in more detail with reference to FIGS. 2 and 3.

[0053] FIG. 3 illustrates the structure of the write buffer in preferred embodiments of the present invention. The structure illustrated in FIG. 3 is suitable for use with a 32-bit RISC processor connected to a processor bus consisting of a 32-bit data bus, a 32-bit address bus, and a 4-bit control bus. Hence, bits 0-31 of each row 310 of the write buffer are reserved for storing either an address or a data value. Further, bits 32-35 are reserved for storing either 4 bits of control data associated with an address stored in that row, or a 4-bit data mask associated with a data value stored in that row.

[0054] In preferred embodiments the four bits of control data stored in those rows containing an address include a 2-bit size field (8, 16, 32, 64 bit data transfer width), plus any additional control flags required, such as a privilege (“supervisor”) access indicator.

[0055] In addition to the above mentioned 36 bits, in accordance with preferred embodiments of the present invention, a single 37th bit is added to each row to provide a flag field to indicate whether that row contains an address, or a data value. In preferred embodiments, a logic “0” value indicates that the row contains an address, whereas a logic “1” value indicates that the row contains a data value. Clearly, the meaning of these logical values could be reversed without departing from the present invention, such that a logic 1 value would indicate an address and a logic 0 value would indicate a data value.
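
The 37-bit row layout just described can be illustrated by packing a row into a single integer. This is a minimal sketch: `pack_row` and `unpack_row` are hypothetical helper names, and placing the flag at bit position 36 (counting from 0) is an interpretation of "the 37th bit".

```python
PAYLOAD_BITS = 32   # bits 0-31: address or data value
FIELD_BITS = 4      # bits 32-35: control data (address row) or data mask (data row)
FLAG_SHIFT = PAYLOAD_BITS + FIELD_BITS  # bit 36: 0 = address, 1 = data value


def pack_row(payload, field, is_data):
    """Pack one write buffer row into a 37-bit integer."""
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload must fit in 32 bits")
    if not 0 <= field < (1 << FIELD_BITS):
        raise ValueError("control/mask field must fit in 4 bits")
    return (int(is_data) << FLAG_SHIFT) | (field << PAYLOAD_BITS) | payload


def unpack_row(row):
    """Return (is_data, field, payload) for a packed row."""
    payload = row & ((1 << PAYLOAD_BITS) - 1)
    field = (row >> PAYLOAD_BITS) & ((1 << FIELD_BITS) - 1)
    is_data = bool((row >> FLAG_SHIFT) & 1)
    return is_data, field, payload


address_row = pack_row(0x8000_0000, 0b0010, is_data=False)
data_row = pack_row(0xDEAD_BEEF, 0b1111, is_data=True)
```

The only difference between an address row and a data row is the flag bit and the interpretation of the 4-bit field, which is what lets the same storage serve both purposes.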

[0056] In preferred embodiments, as illustrated in FIG. 3, data output by the processor core is input to the bottom of the write buffer, and the write buffer is a FIFO buffer, such that the item that has been stored in the buffer the longest is output first, each row being read out from the top of the buffer as illustrated in FIG. 3.

[0057] In burst mode, a base address, and the corresponding control data will be stored in a first row of the write buffer, and a logic 0 value will be added to the 37th bit to indicate that that row stores an address. Then, each data value following the base address is stored in a separate row of the write buffer, with the 37th bit being set to a logic 1 value to indicate that data is contained in that row. Hence, for a write buffer that is sixteen rows deep, bursts of up to fifteen data word writes can be stored within the write buffer before the write buffer becomes full.

[0058] In a non-burst mode, addresses and data values will be stored alternately in the write buffer, such that a row containing an address is followed by a row containing the data value to be stored at that address. As is clear from FIG. 3, the data value stored in a particular row can be a data word, in this example the data word being 32-bits, or 4 bytes, long. However, alternatively, the data value can be 1 byte, 2 bytes or 3 bytes long, often referred to as sub-word-length data values. In such situations, the 4-bit data mask placed in bits 32-35 of each row containing a data value is used to identify which of the 4 bytes allocated for the data value actually store the data value. Hence, in preferred embodiments, if the data value is a data word, then all 4 bits of the data mask will be set to a logic “1” value, whereas if any of the bytes do not contain the data value, then the corresponding bit in the data mask will be set to a logic “0” value.
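
One possible way to derive the 4-bit data mask is sketched below. The description does not state which mask bit corresponds to which byte, so the mapping used here (mask bit i covers byte i, and the value occupies contiguous bytes starting at a given offset) is an assumption, as is the helper name `byte_mask`.

```python
BYTES_PER_ROW = 4  # four bytes are reserved for the data value in each row


def byte_mask(size_bytes, offset=0):
    """Return a 4-bit mask with a 1 for every byte that holds the data value.

    size_bytes -- length of the data value (1 to 4 bytes)
    offset     -- index of the first occupied byte (assumed contiguous layout)
    """
    if not 1 <= size_bytes <= BYTES_PER_ROW:
        raise ValueError("size must be 1 to 4 bytes")
    if not 0 <= offset <= BYTES_PER_ROW - size_bytes:
        raise ValueError("value does not fit in the row at this offset")
    return ((1 << size_bytes) - 1) << offset


word_mask = byte_mask(4)             # full data word: all four bits set
halfword_mask = byte_mask(2)         # two low bytes occupied
byte_at_2_mask = byte_mask(1, 2)     # single byte in byte position 2
```

Under this assumed mapping a full word always yields 0b1111, consistent with the statement above that all four mask bits are set for a data word.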

[0059] From the above description, it will be appreciated that the write buffer is very flexible, and adapts automatically to store either burst mode write traffic or non-burst mode write traffic. Hence, taking the sixteen row deep FIFO example discussed earlier, burst writes of up to fifteen data words through to non-burst writes of up to eight 1 byte wide stores can be fitted within the FIFO write buffer structure of preferred embodiments, which adaptively adjusts the number of slots it requires for addresses.
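
The capacity figures quoted above follow from counting one address row per write transaction plus one row per data value. A small worked check, using a hypothetical helper `rows_needed`:

```python
DEPTH = 16  # the sixteen row deep FIFO example


def rows_needed(transactions):
    """Rows used by a list of writes, each given as its number of data values.

    Every transaction costs one address row plus one row per data value.
    """
    return sum(1 + n_values for n_values in transactions)


one_long_burst = rows_needed([15])        # 1 address row + 15 data rows
eight_single_stores = rows_needed([1] * 8)  # 8 x (1 address row + 1 data row)
mixed_traffic = rows_needed([4, 1, 1])      # one 4-word burst and two single stores
```

Both extremes fill exactly sixteen rows, while mixed traffic uses only as many address rows as it has transactions.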

[0060] It has been found that this adaptive adjustment is very suitable for the write bandwidth of basic Load and Store RISC processors, which can produce burst-mode sustained writes with few addresses for context and register bank save processes, but also generate fewer byte and half-word non-burst store operations with more address information.

[0061] Having reviewed the structure of the write buffer of preferred embodiments with reference to FIG. 3, the operation of the write buffer will now be discussed in more detail with reference to FIG. 2.

[0062] As is apparent from FIG. 2, the write buffer 105 is separated from the processor bus 50 by a multiplexer 200. As discussed earlier with reference to FIG. 1, the BIU 95 has access to, amongst other things, the control signal on bus line 52, this being indicated by the path 235 in FIG. 2. Upon determining that the data to be written is bufferable, the BIU 95 will send a signal over path 240 to the multiplexer 200 instructing it to output the control and address data on bus lines 52 and 54 to the write buffer 105. The BIU 95 will also instruct the write buffer 105 over path 255 to store the control and address data provided by the multiplexer 200. In addition, the BIU 95 will send a signal over path 250 to set the flag field, in preferred embodiments the 37th bit of the relevant row, to a logic “0” value to indicate that the row contains an address.

[0063] The process will then be repeated for the data on the data bus 56, with the BIU 95 instructing the multiplexer 200 to output the data to the write buffer 105, and the BIU 95 setting the flag field of the relevant row to a logic “1” value. Further, the BIU 95 will cause the write buffer 105 to generate the mask data to be placed in bits 32-35 of the row to indicate which of the four bytes in the row allocated for the data value actually contain the data value.

[0064] If the write operation is a non-burst write, then all the necessary control, address and data information for that write operation will now be stored in the write buffer, and the BIU 95 will be arranged to repeat the above process for each subsequent non-burst write operation, assuming that the write operation is bufferable. If, however, the write is a burst mode write, then the BIU 95 will continue to instruct the multiplexer 200 to output the data on the data bus 56 to the write buffer 105 for each data word in the burst mode write operation. Additionally, the BIU 95 will send a signal over path 250 for each data word stored in the write buffer in order to set the flag field of the corresponding rows to a logic “1” value to indicate that those rows contain data, and will cause the write buffer to generate the necessary mask data.
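
The input side behaviour described above can be sketched as a software model. This is illustrative only: the `WriteBuffer` class and `push_write` method are hypothetical names, rows are held as `(flag, field, payload)` tuples rather than packed bits, only full-word data values are modelled (so every data row gets the mask 0b1111), and refusing a write that does not fit stands in for the processor stall that real hardware would produce.

```python
from collections import deque

ADDRESS, DATA = 0, 1        # flag field values of the preferred embodiment
FULL_WORD_MASK = 0b1111     # all four bytes of the row hold the data value


class WriteBuffer:
    """Behavioural model of the input side of the adaptive write buffer."""

    def __init__(self, depth=16):
        self.depth = depth
        self.rows = deque()  # each row is (flag, control_or_mask, payload)

    def free_rows(self):
        return self.depth - len(self.rows)

    def push_write(self, address, control, words):
        """Store one write: an address row followed by one row per data word.

        Returns False (and stores nothing) if there is not enough room.
        """
        if 1 + len(words) > self.free_rows():
            return False
        self.rows.append((ADDRESS, control, address))
        for word in words:
            self.rows.append((DATA, FULL_WORD_MASK, word))
        return True


wb = WriteBuffer(depth=16)
burst_ok = wb.push_write(0x1000, 0b0010, [0xA, 0xB, 0xC, 0xD])  # 4-word burst
single_ok = wb.push_write(0x2000, 0b0010, [0xE])                # non-burst store
too_big = wb.push_write(0x3000, 0b0010, list(range(15)))        # needs 16 rows
flags = [flag for flag, _, _ in wb.rows]
```

A non-burst store is simply a burst of length one in this model, which is why the same path handles both kinds of traffic.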

[0065] When data is to be read out of the write buffer 105 for storing in the memory 120, then the BIU 95 will firstly determine the value of the flag field for the row of data to be read from the write buffer 105, this value being passed over the path 260 to the BIU 95. Since, in preferred embodiments, the write buffer 105 is a FIFO write buffer, then the data that has been stored in the write buffer the longest will be read out first.

[0066] Once the value of the flag field has been determined by the BIU 95, the BIU 95 will send signals over paths 265, 270, 275 and 280 to the write buffer 105, the demultiplexer 210, the register 220 and the incrementer 230 to control the output of the data onto the control 62, address 64 and data 66 buses of the external bus 60. In particular, if the flag field indicates that the row to be read out from the write buffer contains an address, then the BIU 95 will instruct the write buffer 105 to output the row, and will instruct the demultiplexer 210 to pass bits 0-31 to the incrementer 230, and bits 32-35 to the register 220. Hence, by this approach, the address will be passed to the incrementer 230 and the control data will be passed to the register 220. Both the register 220 and the incrementer 230 will have been instructed to store these values via the signals from the BIU 95 passed over the paths 270 and 280, respectively.

[0067] The BIU 95 will then determine the value of the flag field for the next row, this being a logic “1” value to indicate that the row contains a data value. It will then instruct the write buffer 105 to output the row to the demultiplexer 210, and will instruct the demultiplexer 210 over the path 275 to output on the data path 285 the data value stored in those of bits 0-31 identified by the mask data. At this time, the register 220 and the incrementer 230 will also output the control and address data on the control path 290 and the address path 295, respectively. This data will then be passed to the multiplexer 100 (shown in FIG. 1) for outputting onto the external bus 60.

[0068] If the BIU 95 then determines that the next row to be read out from the write buffer contains an address, then the above process will be repeated so that the control and address information are passed to the register 220 and incrementer 230, respectively, and then the data is output on path 285 whilst the control and address information are output over the paths 290 and 295. However, if the BIU 95 determines that the next row also includes data, then it will instruct the write buffer 105 to output the data to the demultiplexer 210, will instruct the demultiplexer to output the data in bits 0-31 (as identified by the mask data) on the data path 285, will instruct the register 220 to output the control data already stored in the register 220 out on the path 290, and will instruct the incrementer 230 to increment the address and then output the incremented address on the address path 295. By this approach, the control, data and address information is re-synthesised prior to being passed out onto the external bus 60.
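The read-out sequence of the preceding paragraphs can be sketched in Python as follows. Rows are modelled as (flag, payload, field) tuples, with the variables control and address standing in for the register 220 and the incrementer 230. The function name drain and the increment of 4 per data word are assumptions; the description does not state the size of the increment.

```python
ADDRESS, DATA = 0, 1  # flag field values: "0" for address, "1" for data


def drain(rows, step=4):
    """Re-synthesise (control, address, data, mask) transfers from FIFO rows.

    `step` is an assumed increment of one 4-byte word per data row.
    """
    control = None   # stands in for the register holding control data
    address = None   # stands in for the incrementer holding the address
    fresh = False    # True until the stored address has been output once
    transfers = []
    for flag, payload, field in rows:
        if flag == ADDRESS:
            # Address row: latch the address and control data, output nothing.
            address, control, fresh = payload, field, True
        else:
            if address is None:
                raise ValueError("data row read out before any address row")
            if not fresh:
                # A further data row of a burst: increment the address first.
                address = (address + step) & 0xFFFFFFFF
            fresh = False
            transfers.append((control, address, payload, field))
    return transfers


rows = [
    (ADDRESS, 0x100, 0x3),  # start of a two-word burst
    (DATA, 0xAA, 0xF),
    (DATA, 0xBB, 0xF),
    (ADDRESS, 0x200, 0x1),  # a following non-burst write
    (DATA, 0xCC, 0x1),
]
transfers = drain(rows)
```

With these inputs the three transfers carry the addresses 0x100, 0x104 and 0x200, so the second word of the burst receives an address even though no row was spent storing it.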

[0069] If at any stage, the BIU 95 determines that the write buffer 105 is full, and the BIU 95 determines that a further bufferable write operation is to be added to the write buffer, then the BIU 95 will issue a wait signal to the processor bus 50 to advise that the write buffer 105 is full. How this information is used will be dependent on which logical unit is initiating the bufferable write operation. As an example, as discussed earlier, if the processor core 10 is issuing a bufferable write operation, and the write buffer 105 is full, then the processor core will stall until a free slot in the write buffer becomes available.
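The full-buffer behaviour can be modelled with a short Python sketch. The class name WriteBuffer, its methods and the capacity of two rows are hypothetical; the return value of push stands in for the wait signal issued to the processor bus.

```python
from collections import deque


class WriteBuffer:
    """Fixed-capacity FIFO of rows that reports a wait condition when full."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.rows = deque()

    def is_full(self):
        return len(self.rows) >= self.num_rows

    def push(self, row):
        """Store a row and return False, or return True (wait) if full."""
        if self.is_full():
            return True
        self.rows.append(row)
        return False

    def pop(self):
        """Remove and return the row that has been stored the longest."""
        return self.rows.popleft()


buf = WriteBuffer(num_rows=2)
first_wait = buf.push(("address", 0x100))    # stored
second_wait = buf.push(("data", 0xAA))       # stored, buffer now full
third_wait = buf.push(("address", 0x200))    # rejected: wait signalled
buf.pop()                                    # a row is written out to memory
retry_wait = buf.push(("address", 0x200))    # a slot is free again
```

A unit that receives the wait indication would simply retry the same row once a slot has been freed, which corresponds to the stall described above.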

[0070] Although a particular embodiment has been described herein, it will be appreciated that the invention is not limited thereto and that many modifications and additions thereto may be made within the scope of the invention. For example, various combinations of the features of the following dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.

I claim:
 1. A data processing apparatus comprising: a processor core for generating addresses identifying locations in a memory and data values for storing in the memory; a write buffer for storing the addresses and data values output by the processor core, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory; the write buffer comprising a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.
 2. A data processing apparatus as claimed in claim 1, wherein each row comprises ‘n’ bits and the flag field comprises one or more of said ‘n’ bits.
 3. A data processing apparatus as claimed in claim 1, wherein said flag field comprises a single bit.
 4. A data processing apparatus as claimed in claim 1, further comprising: a multiplexer for receiving said addresses and data values from the processor core; and input control logic for controlling the multiplexer to output either a data value or an address to the write buffer for storage in a particular row; the input control logic further controlling the write buffer to set the flag field for that particular row to indicate whether that row has an address or a data value stored therein.
 5. A data processing apparatus as claimed in claim 1, wherein each row further comprises a control field, wherein if an address is stored in a particular row, then the control field of that row is used to store control data associated with the address.
 6. A data processing apparatus as claimed in claim 5, wherein if a data value is stored in a particular row, then the control field is used to store mask data identifying the region or regions of that row containing data.
 7. A data processing apparatus as claimed in claim 6, wherein a plurality of bytes in the row are reserved for storing the data value, and the mask data indicates which of said plurality of bytes contain the data value.
 8. A data processing apparatus as claimed in claim 6, further comprising: a multiplexer for receiving said addresses and data values from the processor core; and input control logic for controlling the multiplexer to output either a data value or an address to the write buffer for storage in a particular row; the input control logic further controlling the write buffer to set the flag field for that particular row to indicate whether that row has an address or a data value stored therein; wherein the input control logic is arranged to control the write buffer to generate the mask data.
 9. A data processing apparatus as claimed in claim 1, further comprising output control logic for controlling the output to the memory of the addresses and data values stored in the write buffer.
 10. A data processing apparatus as claimed in claim 9, further comprising a demultiplexer for receiving the contents of a row of the write buffer, the output control logic being arranged to determine from the flag field whether an address or a data value is included in the row, and to instruct the demultiplexer to output a data value onto a data line or an address onto an address line.
 11. A data processing apparatus as claimed in claim 10, further comprising an incrementer for receiving addresses output on the address line.
 12. A data processing apparatus as claimed in claim 10, wherein each row further comprises a control field, wherein if an address is stored in a particular row, then the control field of that row is used to store control data associated with the address, and wherein the demultiplexer is arranged to output onto a control line control data within the row received from the write buffer, and the data processing apparatus further comprises a register for storing the control data.
 13. A data processing apparatus as claimed in claim 1, wherein the write buffer is a First-In-First-Out (FIFO) buffer.
 14. A write buffer for storing addresses identifying locations in a memory and data values for storing in the memory, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory, the write buffer comprising: a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.